This sub-chapter analyzes salaries for different occupations in New York City.
To get an overview of how salaries are distributed across occupations in New York City, we first draw a Cleveland dot plot of the 10-year average salary of each occupation.
# ks (axis-label formatter) and mytheme (theme object) are defined elsewhere in the project
ggplot(NewSalary) +
  geom_point(aes(Salary_occupation, reorder(Occupations, Salary_occupation)),
             color = "royalblue3", size = 2, alpha = 0.75) +
  ylab('Occupation') + xlab('Salary') +
  ggtitle('Average Salaries of Different Occupations in NYC') +
  scale_x_continuous(labels = ks) +
  mytheme
Observations on Average Salaries by Occupation in NYC:
Based on the plot, the 25 occupations fall into the following groups:
1.) Legal occupations have a much higher average salary than all other occupations.
2.) Health diagnosing and treating practitioners and other technical occupations have the second-highest average salary.
3.) Computer and mathematical occupations and Management occupations have very similar average salaries; the difference between the two is only 810.
4.) Law enforcement workers, Business and financial operations, and Architecture and engineering form a sub-group whose salary range is also small.
5.) Life, physical, and social science; Arts, design, entertainment, sports, and media; Health technologists and technicians; Education, training, and library; Installation, maintenance, and repair; Community and social service; and Construction and extraction. The salary range within this group is 7128.
6.) Office and administrative support, Sales and related, Transportation, and Fire fighting and prevention. The salary range within this group is 3932.
7.) Production; Healthcare support; Building and grounds cleaning and maintenance; Material moving; Personal care and service; Farming, fishing, and forestry; and Food preparation and serving related. The salary range within this group is 8016.
One of the factors that affects salary is time, so we first analyze salaries in different years.
# Keep three snapshot years and drop the "year" prefix from their labels
NewSalary1 <- NewSalary[NewSalary$year %in% c("year2010", "year2015", "year2019"), ]
NewSalary1$year <- plyr::revalue(NewSalary1$year,
                                 c("year2010" = "2010", "year2015" = "2015", "year2019" = "2019"))
ggplot(NewSalary1) +
  geom_point(aes(Salary_year, reorder(Occupations, Salary_avg), color = year),
             size = 2, alpha = 0.75) +
  ylab('Occupation') + xlab('Salary') +
  ggtitle('Salaries of Different Occupations of Different Times in NYC') +
  scale_x_continuous(labels = ks) +
  mytheme
Observations on Salaries by Occupation in NYC, 2010-2019:
Top 3
1.) Legal occupations
Salaries in this occupation increase monotonically, and the rate of increase is accelerating.
2.) Health diagnosing and treating practitioners and other technical occupations
Salaries in this occupation also increase monotonically.
3.) Computer and mathematical occupations
Salaries in this occupation also increase monotonically.
Last 3
1.) Food preparation and serving related occupations
This occupation had the lowest salary of all occupations in 2010. However, its salary has risen steadily since then.
2.) Farming, fishing, and forestry occupations
The salary for this occupation first decreased and then increased. However, in 2019 it still had not recovered to its 2010 level.
3.) Personal care and service occupations
The salary for this occupation also first decreased and then increased. Unlike farming, fishing, and forestry occupations, however, its salary dropped only slightly at first and then increased substantially. Overall, therefore, the salary of this occupation increased.
To examine the salary variation of the 25 occupations in more detail, we draw boxplots for comparison.
ggplot(NewSalaryWithVariance) +
  geom_boxplot(aes(y = reorder(Occupations, Salary_YearlyAvg, FUN = median),
                   x = Salary_YearlyAvg),
               color = "black", fill = "darkred", alpha = 0.7) +
  scale_x_continuous(labels = ks) +
  ggtitle("Boxplots with Salaries for Different Occupations") +
  xlab("Salary") +
  # The missing "+" here previously caused mytheme2 to be printed instead of added
  ylab("Occupations") +
  mytheme2
Observations from the boxplots of yearly salaries:
Occupations whose salaries seem prone to dramatic fluctuations over time:
1.) Legal occupations
2.) Computer and Mathematical occupations
3.) Architecture and engineering occupations
4.) Life, physical, and social science
5.) Farming, fishing, and forestry occupations
Occupations whose salaries have fluctuated very little:
1.) Law enforcement
2.) Health technologists and technicians
3.) Installation, maintenance, and repair
4.) Community and social service
5.) Healthcare support
6.) Material moving
One caveat when reading this plot is that the salary level of an occupation can change the viewer's perception of what counts as large variation: a swing of a given size means much more for a low-paid occupation than for a high-paid one. To address this, we create a second plot in which the boxplots are normalized by dividing each year's salary by the occupation's average salary over the decade. The absolute variation is no longer visible, but the relative degree of fluctuation becomes much clearer.
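To make the effect of normalization concrete, here is a small base-R sketch on made-up numbers; the occupation names and salaries are hypothetical and not taken from the dataset, and the column names only mirror Salary_YearlyAvg and Salary_occupation from this section. Both toy occupations swing by the same 8000 in absolute terms, but after dividing by each occupation's own average, the low-paid one shows a five times larger relative range.

```r
# Toy data (hypothetical): four yearly salaries for two occupations
salaries <- data.frame(
  Occupations = rep(c("High-paid", "Low-paid"), each = 4),
  Salary_YearlyAvg = c(100000, 104000, 96000, 100000,
                       20000, 24000, 16000, 20000),
  stringsAsFactors = FALSE
)

# Each occupation's average over all years, repeated on every row
salaries$Salary_occupation <- ave(salaries$Salary_YearlyAvg, salaries$Occupations)

# Normalize each yearly salary by its occupation's average
salaries$Normalized <- salaries$Salary_YearlyAvg / salaries$Salary_occupation

# Range (max - min) per occupation, before and after normalization
range_width <- function(x) diff(range(x))
abs_range <- sapply(split(salaries$Salary_YearlyAvg, salaries$Occupations), range_width)
rel_range <- sapply(split(salaries$Normalized, salaries$Occupations), range_width)

print(abs_range)  # same absolute spread for both occupations
print(rel_range)  # much larger relative spread for the low-paid occupation
```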
# Drop duplicate rows, then divide each yearly salary by the occupation's 10-year average
NewSalaryWithVariance2 <- NewSalaryWithVariance %>%
  select(1, 2, 5) %>%
  unique()
NewSalaryWithVariance2$Normalized <- with(NewSalaryWithVariance2,
                                          Salary_YearlyAvg / Salary_occupation)
ggplot(NewSalaryWithVariance2) +
  geom_boxplot(aes(x = Normalized, y = reorder(Occupations, Normalized, FUN = median)),
               color = "black", fill = "darkred", alpha = 0.7) +
  ggtitle("Normalized salary in sector per year") +
  # Values are ratios to the occupation's 10-year average, not salaries
  xlab("Normalized salary") + ylab("Occupations") +
  mytheme1
Observations from the normalized salaries per sector and year:
In this plot, each occupation's yearly salaries are divided by that occupation's average salary. A few features of this normalized boxplot stand out.
1.) Farming, fishing, and forestry shows the largest variation in salaries. This is plausible: because this occupation's salaries are relatively low, a given absolute change corresponds to a larger relative change after normalization.
2.) Construction and extraction also shows very large variation, even though its average salary is not particularly low. This sector has two outliers: the lowest salary, 26284, in 2010, and the highest salary, 56175, in 2018. Without these two data points, the variance would be much smaller. One possible explanation is that more buildings were under construction in 2018 than in 2010.
3.) Healthcare support and Law enforcement workers are the two most stable sectors: after normalization, they have the smallest variance of all occupations, and law enforcement occupations appear to have the most stable salaries overall. This might be because salaries for these jobs are set by the local government, and government jobs tend to have stable and consistent pay.
We usually expect salaries to rise over time. The plot shows, however, that salaries do not increase for every type of occupation. Only two types of occupations have lower salaries in 2019 than in 2010 to 2013: farming, fishing, and forestry occupations and healthcare support occupations. Broadly, salaries follow one of two trends: a monotonically increasing trend, which covers 18 occupations, and a trend that first decreases and then increases, which covers 7 occupations.
The monotonically increasing group contains the following 18 occupations:
1. Legal occupations
2. Health diagnosing and treating practitioners and other technical occupations
3. Computer and mathematical occupations
4. Management occupations
5. Business and financial operations occupations
6. Architecture and engineering occupations
7. Arts, design, entertainment, sports, and media occupations
8. Education, training, and library occupations
9. Installation, maintenance, and repair occupations
10. Community and social service occupations
11. Construction and extraction occupations
12. Office and administrative support occupations
13. Sales and related occupations
14. Transportation occupations
15. Production occupations
16. Building and grounds cleaning and maintenance occupations
17. Material moving occupations
18. Food preparation and serving related occupations
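The split into the two trend groups was made by reading the plots. One way to make such a split reproducible is to check the signs of the year-over-year changes. The helper classify_trend below is hypothetical (it is not part of the original analysis) and is applied to toy salary series; a strict check like this treats any small dip as a break in the trend, so it may not match a visual classification exactly.

```r
# Hypothetical helper: label a yearly salary series by its year-over-year changes
classify_trend <- function(salary) {
  steps <- diff(salary)
  if (all(steps >= 0)) {
    "monotonically increasing"
  } else {
    "not monotonically increasing"
  }
}

# Toy series (hypothetical values), ordered by year
toy_series <- list(
  steady_growth = c(50000, 51000, 53000, 56000),
  dip_then_rise = c(50000, 47000, 46000, 52000)
)

trends <- sapply(toy_series, classify_trend)
print(trends)
```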
In the second group, the salaries of some occupations decreased a lot and then increased only a little, so their overall trend is downward. For other occupations in this group, salaries decreased a little and then increased a lot, so their overall trend is upward. To examine the salary changes in this group in detail, we draw a line plot for each of these occupations.
# Occupations whose salaries are not monotonically increasing, in display order
non_monotonic <- c("Law enforcement workers",
                   "Life, physical, and social science",
                   "Health technologists and technicians",
                   "Fire fighting and prevention",
                   "Healthcare support",
                   "Personal care and service",
                   "Farming, fishing, and forestry")
NotMonotonous <- NewSalary[NewSalary$Occupations %in% non_monotonic, ]
NotMonotonous$Occupations <- factor(NotMonotonous$Occupations, levels = non_monotonic)

# Turn "year2010" ... "year2019" into "2010" ... "2019"
NotMonotonous1 <- NotMonotonous
NotMonotonous1$year <- sub("^year", "", as.character(NotMonotonous1$year))
ggplot(NotMonotonous1, aes(year, Salary_YearlyAvg, group = Occupations)) +
  geom_line(size = 1, color = "black") +
  geom_point(color = "royalblue3", size = 2) +
  # One panel per occupation; free y scales show each panel's own variation
  facet_wrap(~Occupations, ncol = 2, scales = "free_y") +
  scale_y_continuous(labels = ks) +
  theme(axis.text = element_text(size = 8),
        axis.title = element_text(size = 15, face = "bold"),
        strip.text.x = element_text(size = 12, face = "bold.italic"),
        plot.title = element_text(size = 18, face = "bold")) +
  # The x axis shows years, not occupations
  ylab('Salary') + xlab('Year') +
  ggtitle('Salaries of Different Years by Occupations in NYC')
Observations from sectors without clear salary trends:
These plots show both the year-to-year variation in salaries and the general trend for each of these occupations.
1. Law enforcement workers including supervisors
For this occupation, the salary follows a wave-like pattern. The crests are in 2011, 2015, and 2019, and the troughs are in 2012 and 2016. Two points stand out for this occupation.
a) The salary dropped sharply from 2015 to 2016 and then quickly returned to its usual level from 2016 to 2017.
b) The salary decreased from 2017 to 2018, but instead of continuing to fall, it rose sharply from 2018 to 2019.
2. Life, physical, and social science occupations
For this occupation, the salary also follows a wave-like pattern, with an overall upward trend. The crests are in 2013 and 2018, and the troughs are in 2012 and 2014.
3. Health technologists and technicians
From year 2010 to 2013, the salary of this occupation grew slightly at a steady rate. However, from 2014 to 2015, the salary had a sudden drop. After that, the salary started to increase at a higher rate.
4. Fire fighting and prevention, and other protective service workers including supervisors
The salary of this group follows a wave-like pattern and stays at roughly the same level overall. The crests are in 2012, 2013, and 2017, and the troughs are in 2011 and 2016. One point stands out for this occupation.
a) The salary increased suddenly from 2016 to 2017.
5. Healthcare support occupations
Overall, the salary of this occupation trends downward. The crests occurred in 2010, 2013, and 2015, and the troughs in 2014 and 2017.
6. Personal care and service occupations
Overall, the salary of this occupation trends upward. It remained relatively stable before 2015 and increased fairly quickly after that.
7. Farming, fishing, and forestry occupations
For this occupation, the salary increased slightly from 2011 to 2012 and then fell rapidly from 2012 to 2014. After that, it recovered at a slower but steady pace.
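The crests and troughs listed above were read off the plots. A small helper can locate interior turning points by checking where the direction of change flips. The function find_turning_points and the series below are hypothetical; the sketch only compares each year with its neighbours, so an endpoint such as a final-year crest is not flagged, and the edges of flat stretches are also reported as turning points.

```r
# Hypothetical helper: years where a series switches direction
find_turning_points <- function(years, salary) {
  direction <- sign(diff(salary))   # +1 rising, -1 falling, 0 flat
  change <- diff(direction)         # negative at a crest, positive at a trough
  interior <- years[-c(1, length(years))]
  list(crests = interior[change < 0], troughs = interior[change > 0])
}

# Toy wave-shaped series (hypothetical values)
years <- 2010:2016
salary <- c(50000, 52000, 49000, 50500, 53000, 48000, 51000)

turning <- find_turning_points(years, salary)
print(turning)
```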
It is also important to analyze how the salaries of different occupations change. Because different occupations have different base salaries, it is often more meaningful to express salary changes as a percentage. Here, we use each occupation's average salary to represent its salary level.
# Bar chart of the relative salary change per occupation
NewSalaryWithVariance$year <- as.factor(NewSalaryWithVariance$year)
NewSalaryWithVariance$Occupations <- as.factor(NewSalaryWithVariance$Occupations)
# Index with a logical vector (not a one-column matrix) to avoid the tibble warning
NewSalaryWithVariance1 <- NewSalaryWithVariance[NewSalaryWithVariance$year == "year2010", ]
# Divide by the 2010 salary (rows were filtered to 2010 above), then keep one row per occupation
NewSalaryWithVariance1 <- NewSalaryWithVariance1 %>%
  mutate(variance_pct = variance / Salary_YearlyAvg) %>%
  select(1, 6) %>%
  unique()
ggplot(NewSalaryWithVariance1,
       aes(x = fct_reorder(Occupations, abs(variance_pct)), y = variance_pct)) +
  geom_col(fill = "royalblue3", alpha = 0.75) +
  coord_flip() +
  theme(axis.title = element_text(face = "bold"),
        plot.title = element_text(face = "bold")) +
  xlab('Occupations') + ylab('Percentage Difference') +
  ggtitle('Percentage Difference of Salaries over Years') +
  mytheme
Observations on the percentage difference of salaries over the years:
The plot shows that most occupations saw salary increases over the past decade; only two of them saw decreases. Construction and extraction occupations have the largest percentage difference in salary between 2010 and 2019, and Healthcare support occupations have the smallest.
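As a worked illustration of the percentage difference, assume (as the observations suggest, though the source does not show how the column was built) that variance holds the 2019 salary minus the 2010 salary, so that dividing by the 2010 salary gives the relative change. The occupation labels and salaries below are toy values.

```r
# Toy 2010 and 2019 salaries for three hypothetical occupations
salary_2010 <- c(OccA = 40000, OccB = 30000, OccC = 50000)
salary_2019 <- c(OccA = 50000, OccB = 27000, OccC = 55000)

# Relative change with respect to the 2010 salary
pct_diff <- (salary_2019 - salary_2010) / salary_2010

# Order by absolute size, similar to fct_reorder(Occupations, abs(variance_pct)) in the chart
print(pct_diff[order(abs(pct_diff))])
```

A negative value, like OccB here, corresponds to one of the bars pointing left in the chart: a salary that decreased over the decade.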
ggplot(NewSalary) +
  geom_point(aes(Salary_county, reorder(Occupations, Salary_county), color = Boroughs),
             size = 2, alpha = 0.75) +
  ylab('Occupation') + xlab('Salary') +
  scale_x_continuous(labels = ks) +
  ggtitle('Salaries of Different Occupations of Different Counties in NYC') +
  mytheme
Observations on Salaries by Occupation in NYC for Different Counties:
As this plot shows, the counties with the highest and lowest salaries differ from occupation to occupation. For most occupations, the highest salaries are in New York County and the lowest in Bronx County. For relatively low-paid occupations, the highest salaries are in Richmond County.
We draw a dodged bar chart to show how the highest and lowest salaries of the occupations are distributed across counties.
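The code for this chart is not shown in the source. A minimal base-R sketch of the same idea, assuming one salary per occupation and borough (the toy data and occupation names below are hypothetical), finds the borough with the highest and the lowest salary for each occupation and counts them per borough; barplot(..., beside = TRUE) draws the bars side by side, which is the base-R counterpart of a dodged bar chart.

```r
# Toy data (hypothetical): one salary per occupation and borough
county <- data.frame(
  Occupations = rep(c("Occ1", "Occ2", "Occ3"), each = 3),
  Boroughs = rep(c("Bronx", "New York", "Richmond"), times = 3),
  Salary_county = c(30000, 60000, 45000,
                    25000, 40000, 50000,
                    35000, 70000, 40000),
  stringsAsFactors = FALSE
)

# For each occupation, the borough with the highest and with the lowest salary
by_occ <- split(county, county$Occupations)
highest <- sapply(by_occ, function(d) d$Boroughs[which.max(d$Salary_county)])
lowest <- sapply(by_occ, function(d) d$Boroughs[which.min(d$Salary_county)])

# Count occupations per borough, keeping boroughs with a zero count
boroughs <- sort(unique(county$Boroughs))
counts <- rbind(
  Highest = sapply(boroughs, function(b) sum(highest == b)),
  Lowest = sapply(boroughs, function(b) sum(lowest == b))
)
print(counts)

# beside = TRUE places the Highest and Lowest bars next to each other
barplot(counts, beside = TRUE, legend.text = TRUE,
        ylab = "Number of occupations")
```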
Observations on the Distribution of the Highest and Lowest Salaries in Different Counties:
As this plot shows, many occupations have their lowest salaries in Bronx County and some in Kings County or New York County, but none in Queens County or Richmond County.
As for the highest salaries, most occupations have them in New York County or Richmond County and several in Queens County, but none in Bronx County or Kings County.
To see whether the distribution of the highest and lowest salaries across counties changes over time, we draw a stacked bar chart for each year, using a different color for each county.
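The code for this chart is not shown either. A sketch of the per-year version, assuming one row per occupation and year that records the borough with the highest salary (toy data below), converts counts into within-year shares with prop.table; barplot stacks the boroughs within each year by default. Whether the original chart shows counts or shares is not stated, so plotting shares here is an assumption based on the word "Probability" in the observations heading.

```r
# Toy data (hypothetical): borough with the highest salary for each occupation and year
top_by_year <- data.frame(
  year = rep(c(2010, 2011), each = 4),
  Boroughs = c("New York", "New York", "Bronx", "Richmond",
               "New York", "Richmond", "Richmond", "Queens"),
  stringsAsFactors = FALSE
)

# Rows: boroughs, columns: years
counts_by_year <- table(top_by_year$Boroughs, top_by_year$year)

# Divide each column by its total so every year sums to 1
shares <- prop.table(counts_by_year, margin = 2)
print(round(shares, 2))

# A matrix passed to barplot is stacked by default (one bar per year)
barplot(shares, legend.text = TRUE, ylab = "Share of occupations")
```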
Observations on the Probability of Having the Highest/Lowest Salaries in Different Counties:
When each year is considered on its own, the picture differs slightly from the overall one in the following respects.
Overall, no occupation has its highest salary in Bronx County or Kings County. However, the stacked bar chart shows that in 2010, 2012, 2013, and 2014, some occupations had their highest salary in Bronx County, and in every year except 2014 and 2017, some occupations had their highest salary in Kings County.
Overall, no occupation has its lowest salary in Queens County or Richmond County. However, the stacked bar chart shows that in every year except 2013, some occupations had their lowest salary in Queens County, and in every year except 2018, some occupations had their lowest salary in Richmond County.
We also find that the degree of variation across boroughs differs from occupation to occupation. We therefore use a bar chart to rank all occupations by their variation across boroughs. For each occupation, we subtract the smallest county salary from the salary in each of the five counties, add up the differences, and divide the sum by 5. We then divide this value by the occupation's average salary (Salary_occupation in the code) to obtain a relative measure of variation.
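The relative variation computed for each occupation o by the code below can be written as follows, where S_{o,c} is the salary of occupation o in county c (Salary_county), the minimum runs over the five counties (assumed to be what MinValue holds), and \bar{S}_o is the occupation's average salary (Salary_occupation), which is the denominator the code uses.

```latex
V_o = \frac{\frac{1}{5} \sum_{c=1}^{5} \left( S_{o,c} - \min_{c'} S_{o,c'} \right)}{\bar{S}_o}
```

For example, V_o = 0.1 means that, on average, a county pays 10% of the occupation's average salary more than the lowest-paying county.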
# Distance of each county's salary from the occupation's lowest county salary (MinValue)
CountySalary3 <- CountySalary %>% select(1:5)
CountySalary3$variance <- with(CountySalary3, Salary_county - MinValue)
# Average distance over the five counties, relative to the occupation's average salary
CountySalary3 <- CountySalary3 %>%
  group_by(Occupations) %>%
  mutate(variance_pct = (sum(variance) / 5) / Salary_occupation) %>%
  ungroup() %>%
  select(1, 7) %>%
  unique()
ggplot(CountySalary3, aes(x = fct_reorder(Occupations, variance_pct), y = variance_pct)) +
  geom_col(fill = "royalblue3", alpha = 0.75) +
  ylab('Percentage Difference') + xlab('Occupations') +
  coord_flip() +
  ggtitle('Percentage Difference of Salaries in Different Counties') +
  mytheme
Observations on Percentage Difference of Salaries in Different Counties:
Top 5 (largest percentage difference across counties):
1.) Sales and related occupations
2.) Legal occupations
3.) Management occupations
4.) Farming, fishing, and forestry occupations
5.) Arts, design, entertainment, sports, and media occupations
Last 5 (smallest percentage difference across counties):
1.) Personal care and service occupations
2.) Health technologists and technicians
3.) Community and social service occupations
4.) Food preparation and serving related occupations
5.) Life, physical, and social science occupations
ggplot(NewSalary) +
  geom_point(aes(Salary_gender, reorder(Occupations, Salary_gender), color = Gender),
             size = 2, alpha = 0.75) +
  ylab('Occupation') + xlab('Salary') +
  scale_x_continuous(labels = ks) +
  ggtitle('Salaries of Different Occupations of Different Genders in NYC') +
  scale_color_manual(values = c('seagreen3', 'mediumorchid')) +
  mytheme
Observations on the Salaries of Different Occupations of Different Genders in NYC:
As this Cleveland dot plot shows, salaries differ a lot between genders in some occupations, while other occupations pay the two genders similarly. Moreover, men earn more in some occupations and women earn more in others. To understand these patterns better, we analyze gender differences in salary by occupation in more detail.
We use a bar chart to rank occupations by the salary gap between genders. To quantify the gap, we divide the difference between male and female salaries by the occupation's average salary.
# Spread the Gender column into separate Male and Female salary columns
GenderSalary <- NewSalary %>% select(1, 2, 10, 13) %>% unique()
GenderSalary <- pivot_wider(GenderSalary, names_from = "Gender", values_from = "Salary_gender")
# Positive values: men earn more; negative values: women earn more
GenderSalary$variance <- with(GenderSalary, Male - Female)
GenderSalary <- GenderSalary %>%
  mutate(variance_pct = variance / Salary_occupation)
ggplot(GenderSalary, aes(x = fct_reorder(Occupations, abs(variance_pct)), y = variance_pct)) +
  geom_col(fill = "royalblue3", alpha = 0.75) +
  coord_flip() +
  theme(axis.title = element_text(face = "bold"),
        plot.title = element_text(face = "bold")) +
  # x maps to occupations (drawn vertically after coord_flip), y to the percentage
  xlab('Occupations') + ylab('Percentage Difference') +
  ggtitle('Percentage Difference of Salaries for Different Genders') +
  mytheme
Observations on the Percentage Difference of Salaries for Different Genders:
The horizontal bar chart above shows the following.
For most occupations, male employees earn more than female employees. Female employees earn more in only 4 of the 25 occupations: Construction and extraction occupations; Installation, maintenance, and repair occupations; Community and social service occupations; and Transportation occupations.
Top 5 (largest salary gap between genders):
1.) Sales and related occupations
2.) Building and grounds cleaning and maintenance occupations
3.) Material moving occupations
4.) Production occupations
5.) Personal care and service occupations
Last 5 (smallest salary gap between genders):
1.) Transportation occupations
2.) Office and administrative support occupations
3.) Community and social service occupations
4.) Installation, maintenance, and repair occupations
5.) Computer and mathematical occupations
#### Relation between Percentage Difference of Employment and Salary by Gender
Intuitively, the gender composition of an occupation's workforce may be related to which gender earns more in it. We want to check whether this intuition holds. We therefore describe each occupation with two categorical variables, “Gender Distribution” and “Salary Distribution”. “Gender Distribution” takes two values: Male-dominated, meaning the occupation has more male than female employees, and Female-dominated, meaning it has more female than male employees. “Salary Distribution” also takes two values: Male-higher, meaning male employees earn more in the occupation, and Female-higher, meaning female employees earn more. We then draw a mosaic plot to examine the relation between the two variables.
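The mosaic plot's code is not included in the source. A minimal base-R sketch, on hypothetical per-occupation values (the share of male employees and the male-minus-female salary gap), derives the two categorical variables and passes their 2 x 2 contingency table to mosaicplot. In this sketch, an occupation with exactly 50% male employees or a zero salary gap falls into the second category of each ifelse by construction.

```r
# Toy data (hypothetical): share of male employees and male-minus-female salary gap
occupations <- data.frame(
  male_share = c(0.8, 0.7, 0.3, 0.4, 0.9, 0.2),
  salary_gap = c(-500, 2000, 3000, 1500, -200, 4000)
)

# The two categorical variables described above
occupations$Gender_Distribution <- ifelse(occupations$male_share > 0.5,
                                          "Male-dominated", "Female-dominated")
occupations$Salary_Distribution <- ifelse(occupations$salary_gap > 0,
                                          "Male-higher", "Female-higher")

# 2 x 2 contingency table; mosaic tile areas are proportional to the counts
tab <- table(occupations$Salary_Distribution, occupations$Gender_Distribution,
             dnn = c("Salary Distribution", "Gender Distribution"))
print(tab)
mosaicplot(tab, main = "Salary Distribution vs Gender Distribution", color = TRUE)
```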
This mosaic plot suggests that salary distribution is related to gender composition. However, the direction of this relation runs against intuition. We would expect the “Female-higher” group to contain more female-dominated occupations and the “Male-higher” group to contain more male-dominated occupations, but the plot shows the opposite.